AMHS in the Reinforced Dispatch Learning Environment of IC Fabs

GEORGE HORN, Middlesex Industries Inc.

Reinforcement Learning (RL) dispatch and the wafer transports

IT IS SAID THAT ACHIEVING UNIVERSAL dispatch solutions for front-end IC manufacturing is NP-hard*. The ultimate objective is optimum performance while balancing cycle time and utilization. Machine learning algorithms are increasingly being applied in the industry, and great hopes are placed on AI agents for dispatch solutions. The assumption is that if the current state of the system is known, however complex it may be, it can be improved via dispatch learning agents. Given accumulated past knowledge, outcomes may be automatically steered toward presumed classes of benefit via learning agents with a defined Policy. Indications are that the application of such learning agents will outperform today's heuristic dispatch algorithms. The question remains how current AMHS designs, as the executors of dispatch, will contribute to or detract from the above.

Dispatch system
An idealized workflow system is put forth below to aid in understanding dispatch principles subject to reinforcement learning. A workflow model of the fab is constructed, in which optimum utilization at the shortest possible cycle time is sought via dispatch control of the WIP content at each process step. The model defines the smallest element in a fab as a process tool with an input buffer for WIP (the input buffer being an upstream segment of the AMHS transport). The process consists of work flowing through these elements sequentially and repeating through recursion. The computational task is to achieve maximum utilization at a minimum wafer lot content (cycle time). In practical terms, at least one wafer lot must be waiting at the input of each such unit to assure utilization, while at the same time immediate dispatch of a processed wafer lot must be assured (no waiting storage of substrates at the process output). In this picture, an element consists of an AMHS segment bringing and storing wafer lots before the process tool, and the process tool itself.

The graphic model of the fab can then be likened to a Petri net, where transitions are controlled to limit the minimum buffer requirements before the process tools. Transitions, or gates, control the WIP flow into a tool, and gates control the branching of incoming workflow to destination tools. The ideal buffer content associated with a single-lot tool would be a capacity of two lots, while a batch tool would buffer double its batch size. Overall, the substrate content of the fab would be the true process capacity plus double that capacity in wait. Such a workflow system can be closely approximated with digitized conveyor transports [3].

System state and actions
Reinforcement learning is an autonomous process that rewards the learning Agent, depending on its Action, for moving the system State towards achieving a Policy. At each time step, Δt, the system state, s, is observed, and the Agent may choose any available Action, a, moving the system into a new State, s', and rewarding the Agent with R. This is a Markov process, since each step is decided independently of the previous one ([2] and FIGURE 1).
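As a rough illustration of this observe-act-reward cycle, the minimal Python sketch below steps a toy agent through a handful of Δt iterations. The environment, the two-lot ideal buffer target, and the greedy placeholder policy are simplifying assumptions made only for illustration; they are not part of any cited system.

```python
import random

# Toy environment: three tools, each with an input buffer on the AMHS.
# The ideal buffer content for a single-lot tool is two lots, as discussed above.
IDEAL_BUFFER = 2

class FabEnv:
    """Hypothetical, highly simplified fab state for illustration only."""
    def __init__(self, n_tools=3):
        # state s: WIP lots waiting in each tool's input buffer
        self.buffers = [random.randint(0, 5) for _ in range(n_tools)]

    def observe(self):
        return tuple(self.buffers)

    def step(self, action):
        """action: index of the tool to which one lot is dispatched."""
        self.buffers[action] += 1                  # dispatch one lot
        for i, b in enumerate(self.buffers):       # each tool consumes
            if b > 0:                              # one lot per time step
                self.buffers[i] = b - 1
        # reward R: negative deviation from the ideal two-lot buffer content
        reward = -sum(abs(b - IDEAL_BUFFER) for b in self.buffers)
        return self.observe(), reward

def choose_action(state):
    """Placeholder policy: feed the emptiest buffer (a greedy stand-in
    for a learned policy)."""
    return min(range(len(state)), key=lambda i: state[i])

env = FabEnv()
s = env.observe()
for t in range(5):                  # a few time steps, delta-t apart
    a = choose_action(s)            # agent picks action a given state s
    s_next, r = env.step(a)         # environment moves to s', returns reward R
    print(f"t={t}  s={s}  a={a}  s'={s_next}  R={r}")
    s = s_next
```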
In our model, the overall State of the environment is represented by S = {S1, S2, ..., Si}, where Si characterizes the WIP distribution of a product type, Si = {s1, s2, ..., si}, on the incoming AMHS segments si.

Figure 1. The state of a system and the action of an agent to improve it. The agent's action is based on the current state of the environment, on its past experience with it, and on its policy. Its learning is continually reinforced via a reward assessment.

*In computational complexity theory, NP-hardness (non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP".
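Returning to this state representation, one plain way to hold it in software is a nested mapping from product type to per-segment WIP counts, flattened into the fixed-length vector a learning agent would typically consume. The sketch below is purely illustrative; the product and segment names are invented for the example and do not come from the cited works.

```python
# Hypothetical encoding of S = {S1, S2, ..., Si}: each Si is the WIP
# distribution of one product type over the incoming AMHS segments
# that feed the tools.
state = {
    "product_A": {"seg_litho_1": 2, "seg_etch_1": 1, "seg_cmp_1": 0},
    "product_B": {"seg_litho_1": 1, "seg_etch_1": 3, "seg_cmp_1": 2},
}

def to_vector(state, products, segments):
    """Flatten the nested WIP counts into a fixed-length state vector."""
    return [state[p][s] for p in products for s in segments]

products = ["product_A", "product_B"]
segments = ["seg_litho_1", "seg_etch_1", "seg_cmp_1"]
print(to_vector(state, products, segments))   # [2, 1, 0, 1, 3, 2]
```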
A concurrent action of the Agent would be to regulate WIP flow into a tool or at a WIP flow branch: A = {A1, A2, ..., Ai}, where Ai = {a1, a2, ..., ai}. The Agent function would strive to achieve waiting-WIP distributions at the tools that do not exceed double the ideal WIP content [3]:

πt = P(At = a | St = s).

In other words, the Agent policy will tend to converge towards an even, minimum WIP content in the tool buffers, even when it starts out from an unbalanced system state.

Timing
Encoding neural networks with the actual nodes of a semiconductor dispatch system, and its representation of system states, has demonstrated successful Agent learning and Agent action for the semiconductor environment. These demonstrations generally assume practically instantaneous delivery of Actions. However, implementing those actions in the real fab's physical environment takes time. As a consequence, considering today's well-accepted discrete-vehicle AMHS, the above policy statement cannot be executed instantaneously. In other words, At = a | St = s cannot happen.

The assessment of system states, followed by an intelligent agent's calculations and the release of its control actions, requires a period of time. Ideally, for the control action to be effective, its release should be made before the state of the system has changed on its own. This is a computationally intensive process, dependent on the complexity of the controlled state. Yet it is likely that the states of the system in typical IC manufacturing will change rapidly, unpredictably, and in a stochastic fashion, considering either a functional set of processes or the whole: 50% of the changes occur in less than 5 minutes, while 80% of the changes occur in less than 15 minutes [1]. This considered, an iteration of the reinforced learning cycle should be as short as possible. Today's computer systems, collecting state vectors from the system, solving the algorithms of reinforced learning (e.g., neural networks), and issuing a matrix of actions, may take tens of seconds to a minute.

In general, appraising the parameters of semiconductor manufacturing process states, and issuing the actions to modify them, may be computationally fast. If, however, such machine learning is applied to wafer lot dispatching in semiconductor fabrication, then the so-called wafer lot "moving agents", i.e. the AMHS, in segments or as a whole, must have compatible execution capabilities. This, however, is not the case. Average delivery times of current AMHS designs (discrete-vehicle transport) are in the several-minute range, up to 15 minutes, and are themselves stochastically distributed. This, in general, frustrates the actions issued by the agent to varying degrees, and is likely to result in the reduction or elimination of rewards (FIGURE 2). Overall, the scatter of system states approaching the desired goal is adversely affected. An example of such a counterproductive Action may be where the system state routinely changes in less than five minutes, while the Action corresponding to the original state of the system arrives with a delay of 10 minutes.

The AMHS is the essential tool in delivering an Action from the Agent. Currently used AMHS technology (discrete-vehicle transport) depends on excess WIP accumulation in order to assure WIP availability at tool inputs.
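The cost of such delayed delivery can be made concrete with a small simulation: an action computed for the observed state is scored against the state that actually holds once the action arrives. The drift model, timing figures, and reward rule below are assumptions chosen only to illustrate the effect pictured in FIGURE 2, not measured fab behavior.

```python
import random

random.seed(0)

IDEAL = 2  # ideal lots waiting at a tool input, per the dispatch model above

def reward(buffer_after_action):
    """Reward shrinks as the buffer drifts away from the ideal content."""
    return -abs(buffer_after_action - IDEAL)

def simulate(delivery_delay_min, state_change_every_min=5, trials=10_000):
    """Mean reward when the agent's action (send one lot to a buffer it
    observed as one lot short of ideal) arrives delivery_delay_min after
    the observation was made."""
    total = 0.0
    for _ in range(trials):
        buffer = 1                     # observed state: one lot below ideal
        # while the action is in transit, the state changes on its own
        changes = delivery_delay_min // state_change_every_min
        for _ in range(changes):
            buffer = max(buffer + random.choice([-1, 0, 1, 2]), 0)  # drift
        total += reward(buffer + 1)    # the delayed action finally lands
    return total / trials

for delay_min in (0, 5, 10, 15):
    print(f"delivery delay {delay_min:>2} min -> mean reward {simulate(delay_min):.2f}")
```

With instantaneous delivery the action earns the full reward; as the delay grows past the typical state-change interval, the average reward degrades, mirroring the randomized benefit described above.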
But to improve the chances of success for Reinforcement Learning dispatch, the AMHS should be able to deliver WIP without delay. Thus, conveyor-based AMHS should be considered. Because conveyor transport is available at all times at the output of a process, the WIP can move directly, without waiting, to the next process, and thus become the buffer for it.

About the author
Mr. Horn has worked for many years towards understanding the roles AMHS can play in the manufacturing process (cf. publications in IEEE Transactions) and has pointed out ways to improve AMHS roles. Currently he is the president of Middlesex General Industries, Inc., a manufacturer of conveyor AMHS solutions. He can be reached at gwhorn@midsx.com.

REFERENCES
1. Operations Management in Automated Semiconductor Manufacturing with Integrated Targeting, Near Real Time Scheduling and Dispatching. Nirmal Govind et al., IEEE Transactions on Semiconductor Manufacturing, Vol. 21, No. 3.
2. Autonomous Order Dispatching in the Semiconductor Industry Using Reinforcement Learning. Andreas Kuhnle et al., Elsevier B.V., 2019.
3. Towards Lean Front End IC Manufacturing (with AMHS Implants). George W. Horn, IEEE Transactions on Semiconductor Manufacturing, 2022.
4. Deep Reinforcement Learning for Semiconductor Production Scheduling. Bernd Waschneck et al., GSaME, Universität Stuttgart (Grant SemI40), with support by Infineon Technologies.
5. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. Cong Zhang et al., Singapore Institute of Manufacturing Technology, A*STAR. 34th Conference on Neural Information Processing Systems (2020).

Figure 2. The action of an Agent is based on the original state of the environment, while the arrival of that action is delayed, randomizing its benefit due to the altered state of the environment at the time of the action's arrival.